data engineer
Redefining data engineering in the age of AI
As AI becomes central to the enterprise, data engineers are stepping out from behind the scenes to help shape AI strategy and influence business decisions. As organizations weave AI into more of their operations, senior executives are realizing data engineers hold a central role in bringing these initiatives to life. After all, AI only delivers when you have large amounts of reliable and well-managed, high-quality data. Indeed, this report finds that data engineers play a pivotal role in their organizations as enablers of AI. And in so doing, they are integral to the overall success of the business. According to the results of a survey of 400 senior data and technology executives, conducted by MIT Technology Review Insights, data engineers have become influential in areas that extend well beyond their traditional remit as pipeline managers.
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.57)
M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging
Feng, Jinghao, Zheng, Qiaoyu, Wu, Chaoyi, Zhao, Ziheng, Zhang, Ya, Wang, Yanfeng, Xie, Weidi
Agentic AI systems have gained significant attention for their ability to autonomously perform complex tasks. However, their reliance on well-prepared tools limits their applicability in the medical domain, which requires to train specialized models. In this paper, we make three contributions: (i) We present M3Builder, a novel multi-agent system designed to automate machine learning (ML) in medical imaging. At its core, M3Builder employs four specialized agents that collaborate to tackle complex, multi-step medical ML workflows, from automated data processing and environment configuration to self-contained auto debugging and model training. These agents operate within a medical imaging ML workspace, a structured environment designed to provide agents with free-text descriptions of datasets, training codes, and interaction tools, enabling seamless communication and task execution. (ii) To evaluate progress in automated medical imaging ML, we propose M3Bench, a benchmark comprising four general tasks on 14 training datasets, across five anatomies and three imaging modalities, covering both 2D and 3D data. (iii) We experiment with seven state-of-the-art large language models serving as agent cores for our system, such as Claude series, GPT-4o, and DeepSeek-V3. Compared to existing ML agentic designs, M3Builder shows superior performance on completing ML tasks in medical imaging, achieving a 94.29% success rate using Claude-3.7-Sonnet as the agent core, showing huge potential towards fully automated machine learning in medical imaging.
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
The Role of Accuracy and Validation Effectiveness in Conversational Business Analytics
This study examines conversational business analytics, an approach that utilizes AI to address the technical competency gaps that hinder end users from effectively using traditional self-service analytics. By facilitating natural language interactions, conversational business analytics aims to empower end users to independently retrieve data and generate insights. The analysis focuses on Text-to-SQL as a representative technology for translating natural language requests into SQL statements. Developing theoretical models grounded in expected utility theory, this study identifies the conditions under which conversational business analytics, through partial or full support, can outperform delegation to human experts. The results indicate that partial support, focusing solely on information generation by AI, is viable when the accuracy of AI-generated SQL queries leads to a profit that surpasses the performance of a human expert. In contrast, full support includes not only information generation but also validation through explanations provided by the AI, and requires sufficiently high validation effectiveness to be reliable. However, user-based validation presents challenges, such as misjudgment and rejection of valid SQL queries, which may limit the effectiveness of conversational business analytics. These challenges underscore the need for robust validation mechanisms, including improved user support, automated processes, and methods for assessing quality independent of the technical competency of end users.
- North America > United States (0.46)
- Europe > Switzerland (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
- Overview (1.00)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.66)
- Government (0.46)
- Health & Medicine (0.46)
Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data
Liu, Haoyang, Li, Yijiang, Jian, Jinglin, Cheng, Yuxuan, Lu, Jianrong, Guo, Shuyi, Zhu, Jinglei, Zhang, Mianchen, Zhang, Miantong, Wang, Haohan
Machine learning has emerged as a powerful tool for scientific discovery, enabling researchers to extract meaningful insights from complex datasets. For instance, it has facilitated the identification of disease-predictive genes from gene expression data, significantly advancing healthcare. However, the traditional process for analyzing such datasets demands substantial human effort and expertise for the data selection, processing, and analysis. To address this challenge, we introduce a novel framework, a Team of AI-made Scientists (TAIS), designed to streamline the scientific discovery pipeline. TAIS comprises simulated roles, including a project manager, data engineer, and domain expert, each represented by a Large Language Model (LLM). These roles collaborate to replicate the tasks typically performed by data scientists, with a specific focus on identifying disease-predictive genes. Furthermore, we have curated a benchmark dataset to assess TAIS's effectiveness in gene identification, demonstrating our system's potential to significantly enhance the efficiency and scope of scientific exploration. Our findings represent a solid step towards automating scientific discovery through large language models.
- North America > United States > Illinois (0.04)
- Europe > Poland > Greater Poland Province > Poznań (0.04)
- Europe > Netherlands (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (2 more...)
AI-Enabled Software and System Architecture Frameworks: Focusing on smart Cyber-Physical Systems (CPS)
Moin, Armin, Badii, Atta, Günnemann, Stephan, Challenger, Moharram
Several architecture frameworks for software, systems, and enterprises have been proposed in the literature. They identified various stakeholders and defined architecture viewpoints and views to frame and address stakeholder concerns. However, the stakeholders with data science and Machine Learning (ML) related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. Therefore, they failed to address the architecture viewpoints and views responsive to the concerns of the data science community. In this paper, we address this gap by establishing the architecture frameworks adapted to meet the requirements of modern applications and organizations where ML artifacts are both prevalent and crucial. In particular, we focus on ML-enabled Cyber-Physical Systems (CPSs) and propose two sets of merit criteria for their efficient development and performance assessment, namely the criteria for evaluating and benchmarking ML-enabled CPSs, and the criteria for evaluation and benchmarking of the tools intended to support users through the modeling and development pipeline. In this study, we deploy multiple empirical and qualitative research methods based on literature review and survey instruments including expert interviews and an online questionnaire. We collect, analyze, and integrate the opinions of 77 experts from more than 25 organizations in over 10 countries to devise and validate the proposed framework.
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)
- (7 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
- Education (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
Data Engineer - Managed service at Version 1 - Edinburgh, United Kingdom
We pledge "to prove IT can make a real difference to our customer's businesses". We work hard to ensure we understand what our customers need from their technology solutions and then we deliver. We are an award-winning company who provide world class customer service; we think big and we hire great people. Version 1 are more than just another IT services company - we are leaders in implementing and supporting Oracle, Microsoft and AWS technologies. Version 1 is looking for an experienced data engineer join its Managed Services team.
Data Engineer at DAZN - Hyderabad, India
Are you an engineer who loves to make things that just work better? Do you love to work with cutting edge technologies and think about how can this run faster, be deployed quicker or fail less and deliver killer streaming applications that add business value and stick with customers? DAZN is a tech-first sport streaming platform that reaches millions of users every week. We are challenging a traditional industry and giving power back to the fans. Our new Hyderabad tech hub will be the engine that drives us forward to the future.
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence (0.69)
- Information Technology > Architecture > Real Time Systems (0.48)
Sr. Data Engineer - ASM Analytics at Visa - Bengaluru, India
Visa is a world leader in digital payments, facilitating more than 215 billion payments transactions between consumers, merchants, financial institutions and government entities across more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable and secure payments network, enabling individuals, businesses and economies to thrive. When you join Visa, you join a culture of purpose and belonging – where your growth is priority, your identity is embraced, and the work you do matters. We believe that economies that include everyone everywhere, uplift everyone everywhere. Your work will have a direct impact on billions of people around the world – helping unlock financial access to enable the future of money movement.
- Banking & Finance (0.93)
- Information Technology (0.57)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Data Science > Data Mining (0.81)
Machine Learning Tutorial with Python, Jupyter, KSQL and TensorFlow
When Michelangelo started, the most urgent and highest impact use cases were some very high scale problems, which led us to build around Apache Spark (for large-scale data processing and model training) and Java (for low latency, high throughput online serving). This structure worked well for production training and deployment of many models but left a lot to be desired in terms of overhead, flexibility, and ease of use, especially during early prototyping and experimentation [where Notebooks and Python shine]. Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It allows real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. By leveraging it to build your own scalable machine learning infrastructure and also make your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo.